Method and apparatus for motion compensated temporal interpolation of video sequences

ABSTRACT

Method for encoding a digital video stream, comprising the steps of encoding a video sequence into a full frame sequence, forming a decimated frame sequence by removing a predetermined number of frames from the full frame sequence by means of temporal decimation, locally decoding the full frame sequence, locally decoding the decimated frame sequence, temporally interpolating the decoded decimated frame sequence by means of an interpolator, comparing the locally decoded frames of the full frame sequence with the corresponding frames of the locally interpolated frame sequence, determining residual information for a frame based on at least the comparison for that frame, and providing an output stream comprising the decimated frame sequence and the determined residual information.

The invention relates to a method for encoding and decoding video data. When encoding a video signal to make it suitable for digital handling, such as transmission or storage, compression of the video data is used to optimize the use of available bandwidth and storage capacity. Good compression results are obtained with lossy encoding, wherein information of the original signal cannot be fully recovered in the decoding stage.

Although good results can be obtained with lossy encoding, it is an object of the invention to provide an encoding method with which better compression results can be obtained. Better performance can mean that, with a similar compression rate or bandwidth, better decoded results are obtained, or that a similar decoded result is obtained with a better compression rate or smaller bandwidth. To achieve this objective, a method for encoding a video signal is provided according to claim 1.

From a video stream to be encoded, a decimated frame sequence is formed by removing a number of frames of the video stream. The decimated frame sequence is then temporally interpolated in order to make a good estimation of the decimated (i.e. skipped) frames. Subsequently, areas of the skipped-estimated frames are detected in which the estimation is inadequate, in that it does not meet a predetermined standard. By comparing the skipped frames, which are still available in the encoder, with the skipped-estimated frames, these areas can be detected and residual information can be determined. Only the decimated frame sequence and the residual data for the detected areas are then encoded and inserted in an encoded bitstream. Preferably, the temporal interpolation is performed on locally decoded encoded frames of the decimated frame sequence, so that the temporal interpolation is performed on frames that are also available in a decoder.

An encoded bitstream is decoded according to the invention by extracting the residual data from the main bitstream. Subsequently, the main bitstream data is interpolated using an interpolating process similar to that used for the encoding. The residual data is then added to the interpolated frame sequence.

By using the encoding/decoding system according to the invention, a better quality/bandwidth ratio can be obtained, because only relevant residual data is incorporated into the encoded signal.

The invention further relates to a method for decoding, an encoder, a decoder, an audiovisual device, a data container device, a computer program and a data carrier device on which a computer program is stored.

Particularly advantageous elaborations of the invention are set forth in the dependent claims. Further objects, elaborations, modifications, effects and details of the invention appear from the following description, in which reference is made to the drawing, in which

FIG. 1 shows a flow diagram of an encoding method according to the invention,

FIG. 2 shows a flow diagram of a decoding method according to the invention for use in combination with the method of FIG. 1,

FIG. 3 shows a flow diagram of another encoding method according to the invention,

FIG. 4 shows a flow diagram of a decoding method according to the invention for use in combination with the method of FIG. 3,

FIG. 5 shows an example of an encoder according to the invention,

FIG. 6 shows an example of a decoder according to the invention,

FIG. 7 shows a block diagram of an example of an embodiment of another encoder according to the invention,

FIG. 8 shows a block diagram of an example of an embodiment of another decoder according to the invention, and

FIG. 9 shows a block diagram of an example of a video encoder which may be used in the example of an encoder according to the invention of FIG. 7.

In FIG. 1 a flow diagram is shown of an example of an encoding method according to the invention. A video input signal 10 comprising a video sequence is supplied to a video encoder, in this example an MPEG encoder 20. The encoder 20 codes the video signal in a specific digital format, in this example an MPEG format. The encoded signal consists of a sequence of frames, such as an IPP sequence in MPEG. The encoder 20 performs a temporal decimation during the coding, which means that a predetermined number of the frames are skipped or discarded. As an example, the input video signal is a 50 Hz signal, whereas the main stream output signal 30 is a 12.5 Hz signal. The decimating factor is therefore 1 out of 4, meaning that from a sequence of 4 frames a single frame is maintained. It should be noted that this encoding is a standard MPEG operation. Furthermore, the decimating factor can be adjusted to obtain the required reduction in the data stream.
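The relation between the decimating factor and the resulting frame rate can be illustrated with a minimal sketch. The function names `decimate` and `decimated_rate` are hypothetical and serve only as illustration; frames are represented here by their indices.

```python
from fractions import Fraction


def decimate(frames, keep_every):
    """Keep one frame out of every `keep_every` frames (temporal decimation)."""
    if keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    return frames[::keep_every]


def decimated_rate(input_rate_hz, keep_every):
    """Frame rate of the main stream after decimation."""
    return Fraction(input_rate_hz) / keep_every


# Eight consecutive frames of a 50 Hz input, decimated 1 out of 4.
frames = list(range(8))
kept = decimate(frames, 4)            # frames 0 and 4 are maintained
rate = float(decimated_rate(50, 4))   # 12.5 Hz main stream
```

With a decimating factor of 1 out of 2, the same sketch would give a 25 Hz main stream.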

The encoder 20 also encodes a full encoded data stream, that is, without discarding any frames due to temporal decimation. This data stream is sent to a decoder 30, suitable to decode the encoded data stream, which in this example means an MPEG decoder. The decoded data stream 35 is a 50 Hz signal, as no frames were dropped in the encoding process. The data stream 35 is provided to an IP selector 40; the selector 40 performs the same temporal elimination as the encoder 20 performs on the original video input. The result is again a 12.5 Hz signal. This reduced signal is fed to a motion estimator 50, which is embodied in this example as a natural motion estimator. The estimator 50 performs an upconversion from 12.5 Hz to 50 Hz by estimating additional frames. The estimator 50 performs the same upconversion as the decoder will later perform when decoding the coded data stream. Any motion estimation method can be employed according to the invention. In particular, good results can be obtained with motion estimation based on natural or true motion estimation as used in, for example, frame-rate conversion methods. A very cost-efficient implementation is, for example, three-dimensional recursive search (3DRS), which is suitable for consumer applications; see for example U.S. Pat. Nos. 5,072,293, 5,148,269, and 5,212,548. The motion vectors estimated using 3DRS tend to be equal to the true motion, and the motion-vector field exhibits a high degree of spatial and temporal consistency. Thus, the vector inconsistency is not thresholded very often and, consequently, the amount of residual data transmitted is reduced compared to non-true motion estimations.

The upconverted signal 55 is sent to an evaluation unit 60 (as indicated with a minus sign). The full data stream 35 is also sent to the evaluation unit (as indicated with a plus sign). The evaluation unit 60 compares the interpolated frames as determined by the motion estimator 50 with the actual frames. From the comparison it is determined where the estimated frames differ from the actual frames. Differences in the respective frames are evaluated; in case the differences meet certain thresholds, the differential data is selected as residual data. The thresholds can for example be related to how noticeable the differences are; such threshold criteria per se are known in the art. In this example the residual data is described in the form of meta blocks. The residual data stream 120 in the form of meta blocks is then put into an MPEG encoder 70. The residual data can be encoded using a private data channel as is provided for within an MPEG environment.
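The comparison performed by the evaluation unit can be sketched as follows. This is only an illustration under stated assumptions: the text leaves the threshold criterion open, so the mean absolute difference per block is used here as a stand-in, and the function name `block_residuals`, the block size and the threshold value are all hypothetical. Frames are represented as lists of rows of sample values.

```python
def block_residuals(actual, estimated, block=2, threshold=4.0):
    """Return {(row, col): difference_block} for every block whose mean
    absolute difference between actual and estimated frame exceeds
    `threshold`. The criterion is an assumption, not taken from the text."""
    height, width = len(actual), len(actual[0])
    residuals = {}
    for r0 in range(0, height, block):
        for c0 in range(0, width, block):
            # Differential data: actual frame (plus) minus estimate (minus).
            diff = [[actual[r][c] - estimated[r][c]
                     for c in range(c0, min(c0 + block, width))]
                    for r in range(r0, min(r0 + block, height))]
            count = sum(len(row) for row in diff)
            mad = sum(abs(v) for row in diff for v in row) / count
            if mad > threshold:
                residuals[(r0, c0)] = diff
    return residuals


actual = [[10, 10, 50, 50],
          [10, 10, 50, 50],
          [10, 10, 10, 10],
          [10, 10, 10, 10]]
estimated = [[10, 11, 10, 10],
             [10, 10, 10, 10],
             [10, 10, 10, 10],
             [10, 10, 10, 10]]
residuals = block_residuals(actual, estimated)
```

In this example only the upper right block is selected as residual data; the small deviation in the upper left block stays below the threshold and is not transmitted.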

Finally, the main stream of the data and the residual data stream are combined by means of the multiplexer 80 to form a single output data stream 90. The output stream 90 can be transmitted (using for example a (wireless) data transmission connection) or stored or used otherwise.

In FIG. 2 a flow diagram of an example of a method according to the invention to decode the data stream 90 is shown. First, the data stream 90 is demultiplexed in a demultiplexer 100 into the main data stream 30 and the residual data stream 120. To this end, the demultiplexer is programmed to recognize the residual data stream enclosed in the incoming signal. In case a private data channel is used, the demultiplexer extracts the residual data from the private data channel used. Both the main data stream 30 and the residual data stream are decoded by means of an MPEG decoder, shown respectively in steps 130 and 140. The main stream decoded signal is forwarded to a motion estimator, in this example embodied as a natural motion estimator 150. The motion estimator 150, which as such is known in the art, interpolates the data provided, making a 50 Hz signal from the 12.5 Hz signal decoded in the previous step. The upconverted 50 Hz signal is subsequently forwarded to a combiner 160.

In addition to the upconverted signal, the decoded residual data from the decoder 140 is also forwarded to the combiner 160. The combiner 160 combines the information of the main data stream with the residual data stream. Such an operation per se is known in the art, and comprises replacing information, such as meta blocks, in the main data stream with respective residual information, such as meta blocks. The output signal of the combiner 160 is a 50 Hz frame rate video data stream.
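A minimal sketch of such a combining step is given below. It assumes that the residual information consists of differential blocks that are added to the interpolated frame, which is one of the two readings the description allows (adding residual data, or replacing blocks). The function name `combine` and the data layout (a mapping from the top-left block position to a block of differences) are hypothetical.

```python
def combine(interpolated, residuals):
    """Apply residual blocks to an interpolated frame and return the result.

    `residuals` maps a (row, col) top-left position to a block of
    differences; blocks without residual data are left as interpolated."""
    out = [row[:] for row in interpolated]  # do not modify the input frame
    for (r0, c0), block in residuals.items():
        for dr, row in enumerate(block):
            for dc, value in enumerate(row):
                out[r0 + dr][c0 + dc] += value
    return out


interpolated = [[10, 10, 10, 10],
                [10, 10, 10, 10]]
decoded_residuals = {(0, 2): [[40, 40], [40, 40]]}
corrected = combine(interpolated, decoded_residuals)
```

A decoder that does not recognize the residual data would simply output `interpolated` unchanged.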

In case the decoder that receives the data stream 90 is not equipped to detect the residual data stream, the main stream only is decoded. Therefore a usable video signal can be decoded, even with a decoder that is not fully compliant with the residual data signal. However, the decoded signal is not as good as the signal obtainable with the residual data correction.

The invention may be applied in various devices. For example, a data transmission device, like a radio transmitter or a computer network router, that includes input signal receiver means and transmitter means for transmitting a coded signal, such as an antenna or an optical fibre, may be provided with an image encoder device according to the invention that is connected to the input signal receiver means and the transmitter means. Furthermore, a decoder according to the invention can be implemented in for example a DVD recorder or a PVR (HDD) recorder. An encoding and decoding system according to the invention can be implemented with for example internet video streaming services and in-home (wireless) networks.

Good results can be obtained for a temporal decimation of 1 out of 2; typically less than 5-10% of the area of the skipped-estimated frames is detected as in need of residual information. A decimation of 1 out of 4 frames also yields good results. Even more frames can be skipped using the invention for applications that do not require the highest image quality.

The invention also relates to an encoder and a decoder for performing the above illustrated coding and decoding methods. In FIG. 5 an example of an encoder according to the invention is shown. It comprises an input section 310 for receiving video data, connected to an encoder 320. The encoder is connected to a multiplexer 330 and to a local decoder 340. The local decoder 340 is connected to a selector 350 and an evaluation unit 360. The selector 350 is connected via an estimator 370 to the evaluation unit 360. The evaluation unit 360 is connected to the multiplexer via an encoder 380. The multiplexer is connected with an output section 390.

In FIG. 6 an example of a decoder according to the invention is shown. The decoder comprises an input section 410 that is connected to a demultiplexer 420. The demultiplexer 420 is connected to decoders 430 and 440. Both decoders are connected to a combiner 460; the decoder 430 is connected directly, whereas the decoder 440 is connected via an estimator 450. The combiner 460 is connected with an output section 470.

In FIGS. 3 and 4 a second example of an encoding/decoding system is shown. Parts of the invention that correspond with elements from the example embodiment shown above are denoted with the same reference numerals, and for a description of their function reference is made to the above. The second embodiment differs from the first embodiment in that an additional natural motion estimator is used in the decoding stage. To this end, two different types of temporal interpolators are used in the encoding stage in the encoder, a simple temporal interpolator and a complex one. The decoder will now only have to use the simple (and relatively cost-effective) temporal interpolator. The complex temporal interpolator (which is relatively costly) will only have to be employed in the encoder.

The encoding of the video stream is generally similar in both the first and second embodiment. In the second embodiment (see FIG. 3) an additional step 200 is introduced in which the information from the selector 40 is upconverted in a complex temporal interpolator, for example of the natural motion type, to yield highly accurate interpolations for the decimated frames. This high accuracy data is put forward to an evaluator 220.

In parallel with the high accuracy interpolation, the data is also supplied to a simple temporal interpolator 210, of the type employed by the eventual decoder. The simple interpolator 210 yields a medium accuracy data stream that is provided to the above mentioned evaluator 220. The evaluator 220 compares the high and medium accuracy interpolations and yields a corrected vector stream to the multiplexer to be included in the residual information in for example the private data channel. The vector stream is also provided to a combiner 230 that combines the vector data with the medium accuracy interpolation result of the simple temporal interpolator 210. The combined signal is fed to the natural motion estimator 50′ that uses the information to adjust the interpolated frames. The subsequent residual data determination is similar to the first embodiment.

The resulting encoded data stream comprises the main stream data, the residual data, and the correction vector information. The bandwidth used is therefore slightly larger than in the first embodiment, but better quality results are obtained.

In decoding, shown in FIG. 4, the incoming data is demultiplexed into the main data stream, the residual stream (similar to the first example), and the vector data. The formation of the video output is done in similar fashion to the first embodiment, with the addition that the natural motion estimation 150′ also includes the result from the medium quality estimator 210′, which results are corrected in 230′ by the decoded vectors. By using the additional medium quality estimation, the end result is significantly improved, even more so if the correction vectors are used. The additional costs for obtaining the better quality are relatively small, and consist of an extra simple motion estimation device and a slightly increased bandwidth. Furthermore, an additional high quality estimator is required in the encoding step, but this only marginally increases the cost for the encoder.

In the examples of devices and methods described above, the residual data stream is encoded or decoded using the same type of encoding or decoding as the main data stream. It is likewise possible to encode or decode the residual data using a different type of encoding or decoding. For example, the encoding or decoding of the residual data stream may be specifically adapted to the residual data. In that case, a more efficient encoding may be obtained compared to using the same encoding or decoding for both the main data stream and the residual data stream. The increase of coding efficiency may for example be caused by the difference in correlation between the residual data and the main data, since in general there will be less correlation between consecutive frames in the residual data stream than between consecutive frames in the main data stream.

The encoder for the residual data may be some special or proprietary coding scheme, which may take into account the characteristics of the visual content of the residual data stream. For example, scattered non-empty blocks in the residual data could first be clustered in a larger group.

FIGS. 7 and 8 show block diagrams of an example of an encoder and decoder, respectively, in which the residual data and the main data are interleaved during coding.

The encoder of FIG. 7 comprises an input section 510 for receiving video data, connected to a video encoder 520, for example an MPEG encoder. The video encoder 520 is connected to a multiplexer 530 and to a local decoder 540. The local decoder 540 is connected to a selector 550 and an evaluation unit 560. The selector 550 is connected via an estimator 570 to the evaluation unit 560. The evaluation unit 560 is connected to the encoder 520. The multiplexer 530 is connected to or has an output section 590.

The video encoder 520 codes the video signal in a specific digital format, in this example an MPEG format. The encoder 520 also provides a full encoded data stream, that is, without discarding any frames due to temporal decimation. This data stream is sent to a decoder 540, suitable to decode the encoded data stream. In this example the decoder 540 is an MPEG decoder. The decoded data stream 535 is a 50 Hz signal, as no frames were dropped in the encoding process. The data stream 535 is provided to an IP selector 550; the selector 550 performs the same temporal elimination as the encoder 520 performs on the original video input. The result is again a 12.5 Hz signal. This reduced signal is fed to a motion estimator 570, which is embodied in this example as a natural motion estimator.

The estimator 570 performs an upconversion from 12.5 Hz to 50 Hz by estimating additional frames. The estimator 570 performs the same upconversion as the decoder will perform when decoding the coded data stream. In this example, the estimator 570 is a natural motion estimator. The upconverted signal 555 is sent to an evaluation unit 560 (as indicated with a minus sign). The full data stream 535 is also sent to the evaluation unit 560 (as indicated with a plus sign). The evaluation unit 560 compares the interpolated frames as determined by the motion estimator 570 with the actual frames. From the comparison it is determined where the estimated frames differ from the actual frames. The comparison may for example consist of checking the difference between the estimated frame and the actual frame against predetermined criteria.

The differences in the respective frames are evaluated; in case the differences meet certain thresholds, a reformat code is transmitted by the evaluation unit 560 to the video encoder 520, which indicates how the encoder should rebuild the respective frame. When the estimated frames are similar to the actual frames, the evaluation unit 560 transmits a skip code to the video encoder 520. The video encoder 520 interleaves the data from the evaluation unit 560 with the main data during coding. Thereby high coding efficiency is achieved while the same components, e.g. the MPEG-2 coder and decoder, are used to encode or decode both the residual data and the main data. Furthermore, the actual frames and the skip code may easily be detected.
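The choice between a skip code and a reformat code can be sketched as below. This is an illustration only: the names `Code` and `evaluate_blocks` are hypothetical, the decision is made per block on a single difference measure, and the threshold value is an assumption. The actual format of the codes depends on the video encoder used.

```python
from enum import Enum


class Code(Enum):
    SKIP = "skip"          # estimate is similar enough to the actual data
    REFORMAT = "reformat"  # encoder should rebuild this part from actual data


def evaluate_blocks(differences, threshold):
    """Map per-block difference measures to skip or reformat codes."""
    return [Code.REFORMAT if d > threshold else Code.SKIP
            for d in differences]


# Three blocks of a skipped frame; only the second differs noticeably.
codes = evaluate_blocks([0.5, 12.0, 3.0], threshold=4.0)
skipped_fraction = codes.count(Code.SKIP) / len(codes)
```

The larger the fraction of skip codes, the less residual data has to be interleaved with the main data.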

FIG. 9 shows an example of an implementation of the video encoder 520. In FIG. 9, the video encoder comprises an encoder device 524, which is connected to a post-processor device, such as for example a Tri-Media® device. The post-processor device comprises variable length encoders 521, 522 which are connected via a reformat device 523. The reformat device 523 is connected to the evaluation unit 560 to receive reformatting instructions. The encoder 524 is also connected to the input of the video encoder 520. The encoder 524 encodes a full encoded data stream without discarding any frames, i.e. the encoder 524 encodes the input without temporal decimation. This data stream is transmitted to the local decoder 540, which is able to decode the full encoded data stream.

In FIG. 8 an example of a decoder according to the invention is shown. The decoder comprises an input section 610 that is connected to a video decoder 630. The video decoder 630 is connected with an output to a selector 640. The selector 640 is directly connected to an overwriter 660. The selector 640 is also connected to an estimator 650. The estimator 650 is connected to the overwriter 660. The overwriter 660 is connected to an output section 670.

The video decoder 630 may decode an encoded data stream and is specifically suited to decode a data stream encoded with the encoder of FIG. 7. The resulting decoded data stream is then transmitted by the video decoder 630 to the selector 640. The selector performs a temporal decimation which corresponds to the temporal decimation of the down-conversion by selector 550 in the encoder of FIG. 7. The decimated data is transmitted by the selector 640 to the estimator 650. The overwriter 660 decides, based on information from the decoder 630, whether to use the data from the estimator 650 or the data which has been dropped by the selector 640.

When the encoder and/or decoder of FIGS. 7-9 are MPEG compliant, the skip code may be a skip macro block code as is provided in the MPEG standard. Such a skip macro block code may also be used in other encoder types, since most video coding standards provide a skip code.

Furthermore, a coded block pattern (CBP) code, as is known from section 8.4.5. of Haskell et al., “Digital video; an introduction to MPEG-2”, Kluwer, 1997, may be used. Such a CBP indicates which blocks in a macro-block are empty, that is, in MPEG: which blocks have all-zero discrete cosine transform coefficients. Thereby, if only a part of a macro block or a frame is to be replaced with the actual frame or (macro-) block, the other parts may be indicated with the CBP, whereby the amount of data is reduced.
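How a coded block pattern marks the empty blocks of a macro-block can be illustrated with a short sketch. It assumes a macro-block of six blocks (four luminance and two chrominance, as in 4:2:0 video) with 64 coefficients each. The function name and the bit ordering (first block in the most significant bit) are assumptions chosen for illustration and should be checked against the standard before use.

```python
def coded_block_pattern(blocks):
    """Return an integer with one bit per block: 1 if the block holds at
    least one non-zero coefficient, 0 if the block is empty."""
    cbp = 0
    for block in blocks:
        coded = any(coeff != 0 for coeff in block)
        cbp = (cbp << 1) | int(coded)
    return cbp


empty = [0] * 64
macro_block = [
    empty,
    [3] + [0] * 63,          # only a DC coefficient is present
    empty,
    empty,
    [0, 0, -1] + [0] * 61,   # a single AC coefficient is present
    empty,
]
cbp = coded_block_pattern(macro_block)
pattern = format(cbp, "06b")
```

Here only the second and fifth blocks carry data, so only those two blocks need to be transmitted for this macro-block.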

If the invention is used in an MPEG context, an efficient choice for the coding of the base frames (i.e. the decimated data stream) is IPP-frame encoding; for the skipped frames, B-frame coding is an effective choice. However, other choices could be made as well.

In an advantageous embodiment, the full frame video sequence is obtained by temporally interpolating a relatively low frame rate video sequence, such as a 24 Hz progressive movie sequence, by a further interpolator of higher quality or accuracy than the interpolator used for interpolating the decimated frame sequence, the further interpolator being e.g. the above described complex temporal interpolator or complex natural motion, or a higher accuracy 2-3 pull down algorithm. The further interpolator is preferably a non-real time, offline interpolator. By using, in the above embodiments, a higher quality further interpolator for interpolating a relatively low frame rate movie sequence, a movie temporal enhancement layer is created. In a decoder, the movie temporal enhancement layer is used in order to obtain decoded video with reduced movie judder. The decimation of the full frame video sequence can be performed efficiently by taking the low frame rate video sequence as the decimated video sequence directly. The movie temporal enhancement layer can also be combined with a spatial enhancement layer such that a backwards compatible bitstream is created with a spatial and temporal enhancement layer for improved video quality.

The invention is not limited to implementation in the disclosed examples of physical devices, but can likewise be applied in another device. In particular, the invention is not limited to physical devices but can also be applied in logical devices of a more abstract kind or in software performing the device functions. Furthermore, the devices may be physically distributed over a number of apparatuses, while logically regarded as a single device. Also, devices logically regarded as separate devices may be integrated in a single physical device.

The invention may also be implemented in a computer program for running on a computer system, at least including code portions for performing steps of a method according to the invention when run on a computer system or enabling a general purpose computer system to perform functions of a computer system according to the invention. Such a computer program may be provided on a data carrier, such as a CD-ROM or diskette, stored with data loadable in a memory of a computer system, the data representing the computer program. The data carrier may further be a data connection, such as a telephone cable or a wireless connection transmitting signals representing a computer program according to the invention.

1. A method for encoding a digital video stream, comprising the steps of providing a full frame video sequence, forming a decimated frame sequence by removing a number of frames from the full frame sequence by means of temporal decimation, temporally interpolating the decimated frame sequence by means of an interpolator, comparing the frames of the full frame sequence with the corresponding frames of the temporally interpolated frame sequence, determining residual information for a frame based on at least the comparison for that frame, and providing an output stream comprising the decimated frame sequence and the determined residual information.

2. A method as claimed in claim 1, wherein the decimated frame sequence is compressively encoded.

3. A method according to claim 1, wherein the residual information is encoded in the form of data blocks.

4. A method according to claim 1, wherein the residual information is encoded in a private data channel.

5. A method according to claim 1, wherein the temporal interpolation is performed by means of natural or true motion.

6. A method according to claim 1, wherein the predetermined number of frames is 1 out of 2.

7. A method according to claim 1, wherein the number of frames is 1 out of 4.

8. A method as claimed in claim 1, wherein the temporal interpolation is assisted by data other than the decimated frame sequence, such as motion vectors.

9. A method as claimed in claim 1, wherein the full frame video sequence is obtained by temporally interpolating a relatively low frame rate video sequence by means of a further interpolator of higher quality or accuracy than the interpolator used for temporally interpolating the decimated frame sequence.

10. A method as claimed in claim 9, wherein the decimated frame sequence is formed by the low frame rate video sequence directly rather than by removing a number of frames from the full frame sequence.

11. A method for decoding a data stream encoded according to claim 1, comprising separating from the encoded data stream the decimated frame sequence and the determined residual information, decoding the decimated frame sequence, temporally interpolating the decoded decimated frame sequence by means of a similar interpolating process used for the encoding, decoding the residual information, and combining the residual information and the interpolated frame sequence to form an output data stream.

12. An encoder for encoding digital video data, provided with an input section for providing a full frame video sequence, means for forming a decimated frame sequence by removing a number of frames from a full frame sequence received from the input section by means of temporal decimation, interpolation means for temporally interpolating the decimated frame sequence by means of an interpolator, comparator means for comparing the frames of the full frame sequence with the corresponding frames of the temporally interpolated frame sequence and for determining residual information for a frame based on at least the comparison for that frame, and an output section for providing an output stream comprising the decimated frame sequence and the determined residual information.

13. A decoder for decoding digital video data with an input section and an output section, provided with a decoding section arranged to perform decoding according to claim 11.

14. An audiovisual device, comprising data input means, audiovisual output means and a decoder device as claimed in claim 13.

15. A data container device containing data representing an output stream obtained with a method as claimed in claim 1.

16. A computer program including code portions for performing steps of a method as claimed in claim 1.

17. A data carrier device including data representing a computer program as claimed in claim 16.

18. A video data stream comprising a decimated frame sequence and residual information relating to the decimated frame sequence, the residual information being based on a comparison, for a respective frame, of a frame temporally interpolated by means of an interpolator based on the decimated frame sequence and the corresponding respective frame of the full frame sequence.